Abstract

Knowledge of the three-dimensional (3D) structures of protein complexes provides a fundamental understanding of biological systems, as well as novel insights for antimicrobial drug and vaccine design. Protein-protein docking is used to predict the 3D structures of protein complexes in silico from the structures of their components. In this study, we developed a protein-protein docking pipeline (PPDP) that integrates a variety of state-of-the-art protein docking and structure prediction techniques, providing a systematic platform to predict large-scale protein-protein interactions (PPIs). The PPDP is deployed on high performance computing (HPC) clusters, thus enabling Department of Defense scientists to harness HPC resources to investigate the PPIs of biowarfare agents for the development of countermeasures. We applied the PPDP to investigate the binding interactions of Yersinia effector proteins with their cognate chaperones and the underlying specificity of chaperone/effector interactions.

1. Introduction

Protein-protein interactions (PPIs) underlie many basic biological processes. Knowledge of the three-dimensional structures of protein-protein complexes is of fundamental importance for understanding biological systems, e.g., host-pathogen interaction networks, and is thus extremely valuable for antimicrobial drug and vaccine design. Rapid advances in gene sequencing technologies and genome-wide protein structure determination have made in silico protein-protein docking a promising approach for the systematic determination and structural characterization of PPIs at the atomic level (Ritchie, 2008).

Protein-protein docking algorithms are designed to predict in silico the three-dimensional (3D) structure of a protein-protein binding complex from its component structures. A number of protein docking programs have been developed over the past decades; ZDOCK and RosettaDock are two widely used representatives. ZDOCK uses the fast Fourier transform (FFT) to globally search the rigid-body transformations of two proteins and is amenable to large-scale decoy generation (Chen, 2003). In contrast, RosettaDock uses Monte Carlo-based searching and local perturbations with structural refinement (Gray, 2003). Although protein docking has been successfully applied to predict protein complexes that agree well with experiments, major challenges remain, especially for flexible proteins and conformational changes induced upon binding. To improve docking accuracy, a variety of post-processing approaches have been developed. These typically include clustering, hotspot filtering, re-scoring, and structural refinement with minimization and/or molecular dynamics simulations (Andrusier, 2008).
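The FFT-based global search mentioned above rests on one identity: the correlation score of a ligand grid at every translation can be obtained with one forward FFT per grid and a single inverse FFT, instead of a separate sum for each translation. The sketch below illustrates this in one dimension under stated assumptions: a Katchalski-Katzir-style shape encoding (receptor surface cells = 1, receptor interior cells penalized with a hypothetical weight of -15, ligand cells = 1) and hypothetical function names. It is not ZDOCK's scoring function, which works on 3D grids, adds further terms such as desolvation and electrostatics, and repeats the translational scan for each sampled rotation.

```python
import cmath

def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(values):
    """Inverse FFT via the identity ifft(x) = conj(fft(conj(x))) / n."""
    n = len(values)
    return [v.conjugate() / n for v in fft([complex(v).conjugate() for v in values])]

def correlation_fft(receptor, ligand):
    """C[s] = sum_i R[i] * L[(i - s) mod n] for every cyclic shift s, in O(n log n)."""
    spectrum = [r * l.conjugate() for r, l in zip(fft(receptor), fft(ligand))]
    return [c.real for c in ifft(spectrum)]

def correlation_direct(receptor, ligand):
    """The same scores by brute force, O(n^2), for checking the FFT route."""
    n = len(receptor)
    return [sum(receptor[i] * ligand[(i - s) % n] for i in range(n)) for s in range(n)]

# Toy 1-D grids: receptor surface = 1, receptor interior = -15 (clash penalty), ligand = 1
receptor = [1, -15, -15, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]

scores = correlation_fft(receptor, ligand)
# Round before taking the argmax so floating-point noise cannot break ties
best_shift = max(range(len(scores)), key=lambda s: round(scores[s], 6))
print(best_shift, [round(c) for c in scores])  # 3 [-14, -30, -14, 1, 0, 0, 0, 1]
```

Shifts that push the ligand into the receptor interior score strongly negative, while shifts that only touch surface cells score positive; the FFT route returns the same numbers as the brute-force sum.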

An important application of protein-protein docking is to predict large-scale PPIs at the genomic level (Vajda, 2002). In this case, a large number of proteins are cross-docked and the true interacting partners are identified based on the docking scores. This is a more challenging problem for protein docking, as it requires not only predicting the correct binding mode of two proteins but also distinguishing true binding partners from a large pool of non-binders. In practice, it also demands tremendous computing resources and extensive data mining. Although several public Web servers for protein docking are available, they are generally restricted to a single protein-protein docking simulation and are not suitable for large-scale PPI prediction studies.
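Because some proteins score well against almost every partner, raw cross-docking scores are often normalized before partners are assigned. The following is a minimal sketch of one such normalization and is an assumption, not the method used in the PPDP: it averages row-wise and column-wise Z-scores of an energy-like score matrix (lower = more favorable) and then picks, for each ligand, the receptor with the lowest value. The matrix is invented for illustration; receptor 0 is deliberately "sticky", so raw scores assign it to every ligand.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of scores; a constant list maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def normalized_matrix(scores):
    """Average of the row-wise and column-wise Z-scores of each receptor/ligand entry."""
    row_z = [zscores(row) for row in scores]
    col_z = [zscores(list(col)) for col in zip(*scores)]
    return [[(row_z[i][j] + col_z[j][i]) / 2 for j in range(len(scores[0]))]
            for i in range(len(scores))]

def best_receptor_per_ligand(scores):
    """For each ligand (column), the index of the receptor (row) with the lowest score."""
    return [min(range(len(scores)), key=lambda i: scores[i][j])
            for j in range(len(scores[0]))]

# Hypothetical energy-like scores (lower = better); rows = receptors, columns = ligands.
# The intended partners lie on the diagonal.
raw = [[-50, -48, -47],
       [-40, -45, -30],
       [-30, -31, -44]]

print(best_receptor_per_ligand(raw))                     # [0, 0, 0]
print(best_receptor_per_ligand(normalized_matrix(raw)))  # [0, 1, 2]
```

Normalizing along both axes penalizes a receptor that is favorable for every ligand, which is the failure mode that makes the cross-docking discrimination problem harder than single-pair docking.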

In this study, we developed a protein-protein docking pipeline by integrating various state-of-the-art protein docking programs and structural refinement techniques for the systematic, large-scale prediction of protein complexes. Two docking modules, based on ZDOCK and RosettaDock, were designed in combination with a number of post-processing approaches to improve docking accuracy. The pipeline was automated and deployed on high performance computing (HPC) clusters with a Web-based graphical user interface (GUI), providing a useful tool for scientists in the Department of Defense (DoD) to harness HPC resources to study the PPIs of biowarfare agents for the development of countermeasures.

2. Methodology

The Protein-Protein Docking Pipeline (PPDP) integrates freely downloadable software components from various academic and government research laboratories. The pipeline comprises two basic docking modules: 1) a ZDOCK-based docking module (ZDM) that generates a large number of “low-resolution” decoys of binding complexes and 2) a RosettaDock-based docking module (RDM) for “high-resolution” docking with flexible local perturbations and structural refinement. At the post-processing stage, a number of approaches are integrated with each docking module. These include biochemical information filtering, pairwise root mean square deviation calculations and clustering using the MMTSB tool (Feig, 2004), energy minimization using the Amber molecular modeling package (Case, 2008), and re-scoring using a variety of objective functions, namely ZRANK (Pierce, 2007), DFIRE (Zhang, 2004), EMPIRE (Liang, 2007), and MM-PB/GBSA (Onufriev, 2000).
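Pairwise RMSD calculation and clustering of decoys can be sketched as follows. This is a simplified, hypothetical stand-in and not the MMTSB tool's clustering algorithm: it assumes all decoys share a fixed receptor frame (as with rigid-body decoys in which only the ligand moves), so ligand RMSD is computed without superposition, and it uses a greedy scheme that repeatedly takes the decoy with the most neighbors within an RMSD cutoff as a cluster center.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, without superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def greedy_cluster(decoys, cutoff):
    """Repeatedly pick the unassigned decoy with the most unassigned neighbors within
    `cutoff` (RMSD), make it a cluster center, and remove its members from the pool."""
    n = len(decoys)
    dist = [[rmsd(decoys[i], decoys[j]) for j in range(n)] for i in range(n)]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        candidates = sorted(unassigned)
        center = max(candidates,
                     key=lambda i: sum(dist[i][j] <= cutoff for j in candidates))
        members = [j for j in candidates if dist[center][j] <= cutoff]
        clusters.append((center, members))
        unassigned.difference_update(members)
    return clusters

# Hypothetical two-atom ligand decoys in a shared receptor frame (coordinates in Å)
decoys = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
]
print(greedy_cluster(decoys, cutoff=1.0))  # [(0, [0, 1, 2]), (3, [3])]
```

Large clusters are often taken as a sign of a broad, well-sampled energy funnel, which is one reason cluster size is used alongside the score when ranking decoys.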

As an integrated pipeline, the PPDP provides a systematic platform to design and optimize different docking protocols for PPI studies. An efficient hierarchical docking protocol is implemented, which generates a large number of decoys with the ZDM, followed by local perturbation and structural refinement of the top-scoring models with the RDM. In addition, the PPDP can be integrated with other Biotechnology HPC Software Application Institute (BHSAI) pipeline tools, such as the Protein Structure Prediction Pipeline (PSPP) for homology model prediction (Lee, 2009) or the Automated Protein Ensemble Generator (APEG) for ensemble generation. A workflow of the integrated PPDP is shown in Figure 1.
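The hierarchical protocol (coarse global sampling, then local refinement of only the top-scoring models) can be illustrated with a one-dimensional toy. Everything here is hypothetical: `energy` is a made-up landscape standing in for a docking score, the coarse grid scan plays the role of the ZDM, and a Metropolis Monte Carlo walk with small random moves plays the role of the RDM's local perturbation. RosettaDock's actual refinement also repacks side chains and minimizes, which this sketch omits.

```python
import math
import random

def energy(x):
    """Hypothetical rugged 1-D landscape standing in for a docking score."""
    return (x - 2.3) ** 2 + 0.5 * math.cos(8.0 * x)

def coarse_scan(lo, hi, step):
    """Stage 1 (ZDM-like): score every pose on a coarse grid; returns (energy, pose) pairs."""
    n = int(round((hi - lo) / step)) + 1
    return [(energy(lo + i * step), lo + i * step) for i in range(n)]

def metropolis_refine(x, rng, n_steps=500, max_move=0.1, kT=0.1):
    """Stage 2 (RDM-like): Monte Carlo local perturbation with Metropolis acceptance.
    Returns the lowest-energy (energy, pose) seen, including the starting pose."""
    e = energy(x)
    best = (e, x)
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            best = min(best, (e, x))
    return best

def hierarchical_dock(top_k=3, seed=0):
    """Refine only the top_k coarse decoys; returns refined (energy, pose), best first."""
    rng = random.Random(seed)
    decoys = sorted(coarse_scan(-5.0, 5.0, 0.5))  # lowest energy first
    refined = [metropolis_refine(x, rng) for _, x in decoys[:top_k]]
    return sorted(refined)

print(hierarchical_dock())
```

The design point is cost: the cheap global scan touches every pose, while the expensive refinement runs only on `top_k` candidates, mirroring the hours-versus-days gap between the two docking engines.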
Figure 1. A workflow of the Protein-Protein Docking Pipeline (PPDP)


The PPDP is specifically designed for HPC systems to study large-scale PPI networks. Unlike publicly available servers, the PPDP can process a large dataset of protein complexes in parallel. The inputs to the PPDP are two lists of proteins (receptors and ligands), and the output is a score matrix of the top-ranked models for each receptor/ligand pair. The PPDP uses the native “Job Array” feature of the queuing system to efficiently manage the submission and execution of multiple jobs. Currently, the PPDP is deployed on the HPC cluster “MANA” at the Maui HPC Center and “MJM” at the US Army Research Laboratory (ARL) DoD Supercomputing Resource Center (DSRC) using the PBS queuing system. In general, docking one protein complex takes ~1 to 2 hours with single-core ZDOCK, whereas single-core RosettaDock needs more than 5 days to finish one protein-docking task. Therefore, we assessed the performance of the parallelized RosettaDock_MPI on HPC clusters. Figure 2 shows the speed-up curve tested on the MJM cluster: the parallelized RosettaDock achieves near-linear speed-up up to 64 cores.
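With a job array, each array element must work out from its index which receptor/ligand pair it owns. A minimal sketch, assuming a row-major enumeration of the receptor × ligand matrix; the function names and protein labels are hypothetical. The environment variable that carries the index depends on the PBS flavor (PBS Professional uses `PBS_ARRAY_INDEX`, TORQUE uses `PBS_ARRAYID`), so the lookup below tries both and should be checked against the local installation.

```python
import os

def pair_for_index(index, receptors, ligands, first_index=0):
    """Map a flat job-array index to one (receptor, ligand) pair in row-major order."""
    k = index - first_index
    if not 0 <= k < len(receptors) * len(ligands):
        raise IndexError(f"array index {index} is outside the receptor x ligand matrix")
    i, j = divmod(k, len(ligands))
    return receptors[i], ligands[j]

def array_index_from_env(default=0):
    """Read the array index; the variable name differs between PBS flavors."""
    for name in ("PBS_ARRAY_INDEX", "PBS_ARRAYID"):
        value = os.environ.get(name)
        if value is not None:
            return int(value)
    return default

# Hypothetical inputs; an array of len(receptors) * len(ligands) elements covers every pair.
receptors = ["rec_1", "rec_2"]
ligands = ["lig_1", "lig_2", "lig_3"]
print(pair_for_index(4, receptors, ligands))  # ('rec_2', 'lig_2')
# Inside a running array element: pair_for_index(array_index_from_env(), receptors, ligands)
```

Submitting one array with N × M elements lets the scheduler handle queuing and retries, instead of the pipeline submitting N × M independent jobs.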


Figure 2. The speed-up curve for the PPDP
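Speed-up and parallel efficiency are conventionally defined as S(p) = T(1)/T(p) and E(p) = S(p)/p. As a back-of-the-envelope check using only the figures quoted above: taking 5 days (120 hours) as a lower bound for single-core RosettaDock, perfectly linear scaling on 64 cores would bring one docking task down to about 1.9 hours. This is an idealized lower-bound estimate, since the single-core time is "more than 5 days" and the reported scaling is near-linear rather than perfect.

```python
def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    """E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / cores

t1_hours = 5 * 24                # "more than 5 days" on one core, so a lower bound
ideal_t64_hours = t1_hours / 64  # 64-core time under perfectly linear scaling
print(f"ideal 64-core time: {ideal_t64_hours:.3f} h")  # ideal 64-core time: 1.875 h
```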
To make the PPDP accessible to a broader community of scientists in the DoD, we developed a Web-based, user-friendly interface. The GUI uses the User Interface Toolkit to allow authorized personnel to access HPC Modernization Program (HPCMP) resources by verifying their credentials via SecurID-based Kerberos authentication tools. As shown in Figure 3, the GUI makes it easy for users to specify job-specific parameters, submit jobs, check job status, and analyze the results. Users can inspect the results through integrated visualization tools or download the predicted protein complexes, selected by various criteria, for further processing.


Figure 3. The Web-based graphical user interface of the PPDP


3. Results and Discussion

We applied the PPDP to study the interaction of Yersinia virulence effector proteins with their cognate chaperones. The type III secretion system (T3SS) used by Yersinia pestis and many other Gram-negative bacteria plays an important role in pathogenic invasion of the host cell by delivering effector proteins directly into the host cytosol. The delivery mechanism depends on a specific interaction between the effector protein, held in an unfolded state, and its cognate chaperone in the bacterial cytosol (Cornelis, 2002). The interplay of disordered effector proteins and chaperones is a challenging problem to study, both experimentally and computationally. We used the PPDP to investigate the binding interaction between the disordered Yersinia effector protein YopE and its cognate chaperone SycE (Rodgers, 2008). Starting from compactly folded de novo models generated by the PSPP (Lee, 2009), a large ensemble of unfolded conformations of the chaperone binding domain of YopE (YopECBD) was generated with replica-exchange molecular dynamics (REMD) simulations. A multi-step protein docking protocol that combined ensemble docking, clustering, and structural refinement with the PPDP was used to dock the effector YopECBD to its cognate chaperone SycE. The predicted YopECBD/SycE binding complex was in good agreement with the experimental X-ray crystal structure (Figure 4A). The results indicate that our REMD-based protein ensemble docking strategy is able to sufficiently sample the unfolded effector conformations and subsequently predict the disordered effector/chaperone binding complex. This is the first theoretical study at the atomic level of unfolded bacterial effector/chaperone interactions (Hu, 2009).
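The replica-exchange step in temperature REMD swaps configurations between neighboring temperatures with the Metropolis-type probability P = min{1, exp[(β_i - β_j)(E_i - E_j)]}, where β = 1/(k_B T) and E is the potential energy. The sketch below implements that criterion for one sweep of neighbor exchanges. The temperatures, energies, and function names are illustrative assumptions, since the REMD settings used for YopECBD are not given here.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol*K)

def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis criterion for swapping the configurations held at temperatures t_i and t_j."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def exchange_sweep(energies, temps, rng, offset=0):
    """One sweep of exchanges on neighbor pairs (k, k+1), k = offset, offset+2, ...
    energies[c] is the potential energy of configuration c, which starts at temps[c].
    Returns config_at, where config_at[k] is the configuration now at temps[k]."""
    config_at = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        a, b = config_at[k], config_at[k + 1]
        if rng.random() < exchange_probability(energies[a], temps[k], energies[b], temps[k + 1]):
            config_at[k], config_at[k + 1] = b, a
    return config_at

# Illustrative values: four replicas, energies in kcal/mol, temperatures in K
temps = [300.0, 330.0, 363.0, 400.0]
energies = [-1000.0, -1010.0, -950.0, -940.0]
print(exchange_sweep(energies, temps, random.Random(1)))
```

Moving a lower-energy configuration to a lower temperature is always accepted; the reverse is accepted only occasionally, which is what lets high-temperature replicas carry unfolded conformations across barriers while preserving the correct ensemble at each temperature.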



Figure 4. A: Interaction of Yersinia pestis virulence protein YopE (stick; yellow and magenta) with its cognate chaperone SycE (ribbon; yellow and cyan). The predicted binding mode of YopE (magenta) using the PPDP agreed well with the experimental structure of the YopE/SycE complex (yellow).

B: Predicted YopH/SycH complexes using the PPDP. The effector YopH is shown in magenta (stick) and the chaperone SycH is shown in green and cyan (ribbon).

We further applied the REMD-based ensemble docking approach with the PPDP to predict the binding interaction of the Y. pestis virulence effector YopH with its chaperone SycH. Despite extensive experimental studies of the protein, the crystal structure of the YopH/SycH complex has not been determined. The YopH/SycH complex predicted by the PPDP, consistent with experimental data, showed that the chaperone binding domain of effector YopH adopts a binding mode similar to that of YopE, wrapping around the SycH chaperone dimer in an extended, non-globular form (Figure 4B). The most significant variation was found at the N-terminal region, which exhibited large fluctuations among the predicted complexes. This highlights the challenge of efficiently dealing with protein flexibility and conformational changes to improve docking performance. On the other hand, the result may also imply that the YopH/SycH complex is less stable than YopE/SycE because of the high flexibility of the chaperone binding domain of YopH.
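Fluctuations such as those at the YopH N-terminus are commonly quantified as the root mean square fluctuation (RMSF) of each residue across an ensemble of models. A minimal sketch, assuming the models have already been superposed and are represented by one atom (e.g., C-alpha) per residue; the coordinates below are invented for illustration.

```python
import math

def per_residue_rmsf(models):
    """models[m][r] is the (x, y, z) position of one atom of residue r in model m,
    with all models already superposed. Returns one RMSF value per residue."""
    n_models = len(models)
    rmsf = []
    for r in range(len(models[0])):
        coords = [model[r] for model in models]
        # Mean position of this residue over all models
        center = [sum(c[d] for c in coords) / n_models for d in range(3)]
        msf = sum(sum((c[d] - center[d]) ** 2 for d in range(3)) for c in coords) / n_models
        rmsf.append(math.sqrt(msf))
    return rmsf

# Hypothetical three-residue segment in two models: only residue 0 (the N-terminus) moves
models = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
]
print(per_residue_rmsf(models))  # [1.0, 0.0, 0.0]
```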

An interesting aspect of bacterial effector/chaperone interactions is that, although experiments have revealed that the structures of different effectors and chaperones, as well as the structural binding motif involved in their interaction, are very similar, each effector binds to its cognate chaperone with high selectivity (Lilic, 2006). Here, we attempted to investigate the structural basis of the binding specificity of effector/chaperone interactions using the PPDP. A challenging question is whether the scores from protein-protein docking are able to discriminate the true binding partner from a large number of non-binders. To this end, we benchmarked eight effector/chaperone binding complexes and performed protein docking using the PPDP. The structures of the chaperones and the native effectors were taken from their experimental complexes in the PDB (Berman, 2002). To simplify the docking problem, only the interaction of an effector with a chaperone monomer was considered. Therefore, two segments of each effector (namely, effector a and effector b) were docked to a chaperone monomer separately. Figure 5 shows preliminary results of cross-docking using the ZDOCK module of the PPDP. The predicted binding complexes were scored by ZRANK. Analysis of the top-scored models showed that the effectors bound to the chaperones in a similar manner, with the conserved β-motif interactions. This is consistent with experimental observations, indicating that the PPDP was able to generate the correct conformations of effector/chaperone binding. However, among these eight binding complexes, only four were correctly predicted as true binders by ZRANK scores (YopE/SycE, SptP/SicP, SicA/InvB, and YscE/YscD). The remaining complexes (YopNa/SycN, YopNb/YscB, YscM2/SycH, and YopD/SycD) were scored with weaker binding affinities than non-binders. Further studies using the PPDP with an optimized docking protocol and other post-processing approaches, such as clustering and MM-GBSA scoring, are ongoing.
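The discrimination test above amounts to asking, for each effector, where its cognate chaperone ranks among all chaperones it was cross-docked against. A minimal sketch, assuming an energy-like score in which lower values are more favorable (as with ZRANK) and a hypothetical score matrix; an effector counts as correctly predicted when its cognate chaperone ranks first.

```python
def cognate_ranks(scores, cognate):
    """scores[e][c]: energy-like docking score of effector e against chaperone c
    (lower = more favorable). cognate[e]: index of e's true chaperone.
    Returns the 1-based rank of the cognate chaperone for each effector."""
    ranks = []
    for e, row in enumerate(scores):
        order = sorted(range(len(row)), key=lambda c: row[c])  # best chaperone first
        ranks.append(order.index(cognate[e]) + 1)
    return ranks

# Hypothetical scores for three effectors cross-docked against three chaperones
scores = [[-60.0, -40.0, -35.0],
          [-50.0, -42.0, -30.0],
          [-20.0, -25.0, -55.0]]
ranks = cognate_ranks(scores, cognate=[0, 1, 2])
print(ranks, sum(r == 1 for r in ranks))  # [1, 2, 1] 2
```

Reporting the full rank, rather than only a hit/miss flag, shows how close a missed cognate pair came, which is useful when comparing re-scoring functions.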


4. Conclusions

In summary, we developed an efficient protein-protein docking pipeline that integrates a variety of protein docking programs and structural refinement techniques for the systematic structural prediction of protein complexes. The PPDP has been used at the BHSAI to study large-scale PPI networks of biodefense-related organisms and supports various biodefense-related projects sponsored by the DoD. We applied the PPDP to study bacterial virulence effector/chaperone interactions. As a notable example, the success of the PPDP approach in predicting the structural “disorder-to-order” transition associated with an effector protein binding to its chaperone opens up the possibility of studying the underlying binding specificity of chaperone/effector interactions and devising strategies for interfering with T3SS transport.